DOI: 10.1201/9781003355205-8
Chapter 8
Shotgun Metagenomic
Data Analysis
8.1 INTRODUCTION
In the previous chapter, we discussed amplicon-based metagenomic data analysis, which
profiles a single targeted gene, usually the 16S rRNA gene, in environmental or clinical
samples. Many researchers argue that this approach is not truly metagenomic because it
focuses on a single gene rather than the entire genomes of the microbes in the samples.
In this chapter, we discuss the shotgun metagenomic sequencing approach, which sequences
the entire genomes of the microbes in the samples and therefore provides more insight
into the microbial communities, their genetic profiles, their impact on hosts, and their
association with host phenotypes. Shotgun sequencing of metagenomes is relatively new. It
emerged from progress in high-throughput sequencing technologies, which was followed by
the development of computational resources and tools capable of handling the size and
complexity of metagenomic sequencing data. Shotgun whole-genome metagenomic sequencing
and data analysis are now used to quantify microbial communities and their diversity, to
assemble novel microbial genomes, to identify new microbial taxa and genes, to determine
the metabolic pathways orchestrated by the microbial community, and more.
The raw metagenomic data produced by a high-throughput sequencer originates from
environmental or clinical samples that contain multiple microbial organisms, including
bacteria, fungi, and viruses. Data from samples recovered from a host may be contaminated
with host genomic sequences. Multiple samples can also be sequenced in a single run
(multiplexing). In multiplex sequencing, unique barcode sequences that identify each
sample are ligated to the DNA fragments during the DNA library preparation step. Some
library preparation kits allow multiplexing of hundreds of samples.
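To make the role of the barcodes concrete, the following minimal Python sketch assigns pooled reads back to their samples (demultiplexing). It is an illustration under simplifying assumptions, not the procedure used by any particular sequencer: the function name `demultiplex`, the 4-base barcodes, and the sample names are hypothetical, the barcode is assumed to sit at the start of each read, and only exact matches are accepted. In practice, barcodes are often read separately from the insert and demultiplexing is usually performed by the sequencing platform's own software.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by an exact-match barcode at the start of each read.

    Assumes all barcodes have the same length (hypothetical, simplified model).
    Reads whose prefix matches no known barcode go to "undetermined".
    """
    barcode_length = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        # Split the read into its barcode prefix and the remaining insert.
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Toy data: two samples pooled in one run, plus one read with an unreadable barcode.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGGATTC", "TGCACCCTTAG", "ACGTTTTAAAC", "NNNNGGGCCCA"]
result = demultiplex(reads, barcodes)
```

Here `result` maps each sample name to the list of its reads with the barcode removed. Reads that cannot be assigned are collected separately rather than discarded, so the fraction of unassigned reads can be checked.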
Illumina has multiple kits for library preparation, including Illumina DNA Prep, (M)
Tagmentation, which uses bead-linked transposomes in the tagmentation process to randomly